Document scaling/concurrency and complete config reference for customer-managed runners #18816

Merged
borisschlosser merged 5 commits into master from boris/docs-customer-managed-agents-config-reference on May 7, 2026

@borisschlosser
Contributor

Summary

  • Add a Scaling and concurrency section to the customer-managed workflow runners page, explaining that each runner process executes one workflow job at a time, that scaling is done by adding runner instances, and that Pulumi Cloud claims each pending job exclusively (so two runners never process the same job simultaneously, with crash recovery once a claim expires). Includes three concrete scaling patterns: long-running replicas, ephemeral single_run jobs, and specialized pools via enabled_workflow_types.
  • Expand the Configuration reference:
    • Add the previously undocumented health_threshold / PULUMI_AGENT_HEALTH_THRESHOLD setting.
    • Add a top note covering the PULUMI_AGENT_* env-var convention, env-over-file precedence, and Go-style duration syntax.
    • Reorganize the YAML block into logical groups (required, OIDC, polling/retry, health/observability).
    • Tighten descriptions on existing keys (e.g. deploy_target, circuit_breaker_failures, single_run, env_forward_allowlist always-forwarded DOCKER_HOST).
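The env-over-file precedence mentioned in the Summary can be illustrated with a minimal Python sketch. The only key/variable pair confirmed in this PR is health_threshold / PULUMI_AGENT_HEALTH_THRESHOLD; the general rule of upper-casing the key after the prefix is an assumption extrapolated from that pair, and the function names and placeholder values are hypothetical. This is not the agent's code.

```python
import os

ENV_PREFIX = "PULUMI_AGENT_"  # prefix named in the PR summary


def env_var_name(key):
    """Map a config-file key to an environment variable name.

    Assumption: the key is upper-cased and prefixed, as in
    health_threshold -> PULUMI_AGENT_HEALTH_THRESHOLD.
    """
    return ENV_PREFIX + key.upper()


def resolve(key, file_config, environ=None):
    """Return the effective value for key: environment first, then file."""
    environ = os.environ if environ is None else environ
    name = env_var_name(key)
    if name in environ:
        return environ[name]
    return file_config.get(key)


# Placeholder values only; they are not documented defaults.
file_config = {"health_threshold": "value-from-file", "single_run": "value-from-file"}
fake_env = {"PULUMI_AGENT_HEALTH_THRESHOLD": "value-from-env"}

print(resolve("health_threshold", file_config, fake_env))  # value-from-env
print(resolve("single_run", file_config, fake_env))        # value-from-file
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.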

Test plan

  • Render the page locally (make serve or equivalent) and confirm formatting.
  • Verify the new "Scaling and concurrency" section renders under "Using customer-managed workflow runners".
  • Verify the YAML configuration block parses cleanly and the new health_threshold entry is visible.

🤖 Generated with Claude Code

…er-managed runners

Adds a "Scaling and concurrency" section explaining that each runner
processes one job at a time, that scaling is done by adding runners,
and that the service guarantees a job is claimed by exactly one
runner. Expands the configuration reference to include the previously
undocumented health_threshold setting, notes the env-var prefix and
precedence rules, and reorganizes the YAML block into logical groups
(required, OIDC, polling/retry, health/observability) with tightened
descriptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Contributor

claude Bot commented May 6, 2026

Docs review

Overall a strong improvement — the new "Scaling and concurrency" section gives readers a concrete mental model, and regrouping the configuration reference by purpose is much easier to scan than the previous flat list. A handful of small consistency/style fixes below.

Issues

Inconsistent terminology — concurrency slot vs concurrent slot (lines 56 and 64)

Line 56 says "one additional concurrency slot"; line 64 says "one additional concurrent slot." Pick one and use it in both places. Since the section is titled "Scaling and concurrency," I'd lean toward "concurrency slot."

= symbol mid-sentence (line 64)

- **Long-running runners**: Run multiple instances (for example, replicas of a Kubernetes Deployment, or several systemd units across hosts). Each replica adds one concurrency slot.

Reads more naturally as prose than "Each replica = one additional concurrent slot."

e.g. is inconsistent with the rest of the file (line 178)

Lines 64 and 139 already use "for example." For consistency (and per Google style, which prefers a comma after e.g. regardless):

# If true, the runner exits after completing a single workflow job.
# Useful for ephemeral, one-shot runners (for example, Kubernetes Jobs).

Slash between rate-limited and transient (line 250)

# Maximum number of retries for rate-limited or transient API failures.

A bare / reads ambiguously in prose; "or" is clearer.

Directional reference in YAML comment (line 207)

STYLE-GUIDE.md calls out directional terms like "above." The same information works without it. For example: `## See the "Leveraging OpenID authentication" section.`

Nits

  • Line 58: "picks the job up" reads slightly awkwardly; "picks up the job" is more natural.
  • Line 60: The sentence uses "concurrency" twice in close succession, once as the limit mechanism and once as the limit value ("will not exceed that organization's configured concurrency"). Consider "configured concurrency limit" for clarity.
  • Line 56: "There is no in-process worker pool to size." is a useful disambiguation, but slightly jargon-heavy. Optional rephrase: "Each runner has no internal worker pool to configure."

What's working

  • The exclusive-claim and crash-recovery explanation fills a real gap and is well placed.
  • Grouping the YAML by purpose (required / OIDC / polling-retry / health) makes the reference much more navigable.
  • The health_threshold documentation is precise about the auto-derivation default — easy to act on.
  • The OIDC subsection's note that organization_name and runner_pool_id become required is a nice readability win.

If you'd like me to take another look after revisions, mention @claude.

- Use "concurrency slot" consistently and replace "= one additional"
  with prose ("adds one concurrency slot").
- Replace "picks the job up" with "picks up the job".
- Clarify "configured concurrency limit" to disambiguate from the
  earlier mention of concurrency.
- Soften "no in-process worker pool to size" to "Each runner has no
  internal worker pool to configure".
- Replace "e.g." with "for example" for consistency with the rest of
  the page.
- Replace "/" with "or" in the retry comment for clearer prose.
- Drop the directional reference "above" in the OIDC section header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@borisschlosser
Contributor Author

@claude review again

@pulumi-bot
Collaborator

pulumi-bot commented May 6, 2026

Claude finished @borisschlosser's task in 1m 9s


Docs review (follow-up)

All issues from the prior review have been addressed:

| # | Item | Status |
|---|------|--------|
| 1 | "concurrency slot" / "concurrent slot" inconsistency | ✅ Both now use "concurrency slot" (lines 56, 64) |
| 2 | = symbol mid-sentence on the long-running runners bullet | ✅ Now reads "Each replica adds one concurrency slot." (line 64) |
| 3 | e.g. inconsistent with "for example" elsewhere | ✅ Now uses "for example" in the single_run comment (line 178) |
| 4 | rate-limited / transient slash | ✅ Now "rate-limited or transient" (line 250) |
| 5 | Directional reference ("above") in YAML comment | ✅ Reworded to "See the 'Leveraging OpenID authentication' section." (line 207) |
| 6 | "picks the job up" | ✅ Now "picks up the job" (line 58) |
| 7 | "configured concurrency" → "configured concurrency limit" | ✅ Line 60 |
| 8 | "no in-process worker pool to size" jargon | ✅ Now "Each runner has no internal worker pool to configure." (line 56) |

One small leftover I noticed on second pass

Line 56 reads:

Each workflow runner process executes one workflow job at a time. Each runner has no internal worker pool to configure.

"Each runner" appears twice in two short sentences. Optional tightening — collapse to a single sentence:

Each workflow runner process executes **one workflow job at a time** and has no internal worker pool to configure. To increase the number of jobs your pool can run in parallel, add more workflow runner instances to the pool — each instance contributes one additional concurrency slot.

Not blocking.

What's working

  • New "Scaling and concurrency" section is well-placed and gives a clear mental model (one job per process → scale by adding instances → server enforces the cap).
  • The OIDC subsection's note that organization_name/runner_pool_id become required is a nice readability win.
  • Reorganizing the YAML by purpose (required / OIDC / polling-retry / health) is a real navigation upgrade.
  • The health_threshold doc precisely describes the auto-derivation default.

LGTM modulo the optional nit above.
• Branch: boris/docs-customer-managed-agents-config-reference

@pulumi-bot
Collaborator

pulumi-bot commented May 6, 2026

@borisschlosser
Contributor Author

@claude collapse the "each runner" sentences to one as suggested by yourself

@pulumi-bot
Collaborator

pulumi-bot commented May 6, 2026

Claude finished @borisschlosser's task in 29s


Done

  • Read current state of the file
  • Collapse the two "Each runner" sentences into one
  • Commit and push (4526f1a)

Line 56 now reads as a single sentence:

Each workflow runner process executes one workflow job at a time and has no internal worker pool to configure. To increase the number of jobs your pool can run in parallel, add more workflow runner instances to the pool — each instance contributes one additional concurrency slot.
• Branch: boris/docs-customer-managed-agents-config-reference

github-actions Bot and others added 2 commits May 6, 2026 10:20
Co-authored-by: Boris Schlosser <borisschlosser@users.noreply.github.com>
Contributor

@CamSoper CamSoper left a comment

LGTM — the YAML reorganization and the new "Scaling and concurrency" section are real readability wins, and I spot-checked the defaults and the health_threshold derivation against the agent source; both match.

Three small things you might want to fold into a follow-up (none are blockers):

  1. Line 149: `# Required unless using OIDC (see oidc_token_file below).` is a directional reference. The earlier review pass caught the same pattern with "above" on line 207 and replaced it with a section name. Easiest fix here is to drop "below" entirely: `# Required unless using OIDC.` reads fine since oidc_token_file is named again a few lines later.

  2. Line 208: backtick consistency in the OIDC group header:

## organization_name and runner_pool_id are required, and `token` is not used.

...puts backticks on token but not on the other keys. Either backtick all or none.

  3. Line 99 (outside this PR's diff): the OIDC prose bullet still says token_expiration is "the expiration in seconds for the tokens requested by the workflow runner," but the agent reads it via GetDuration (config.go:154), and the new general note at line 141 establishes Go duration syntax across the board. Worth tightening to something like "the lifetime of tokens requested by the workflow runner (Go duration syntax, e.g. `1h`)" so the page tells one consistent story.
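To make the "seconds versus Go duration syntax" distinction for token_expiration concrete, here is a rough Python sketch that converts Go-style duration strings to seconds. It assumes GetDuration accepts the same syntax as Go's standard `time.ParseDuration` (a number plus a unit suffix, optionally repeated, such as `1h30m`); that is an assumption, not something this review confirms. The function name `parse_go_duration` is hypothetical, and the sketch returns float seconds with no overflow checks, unlike Go's integer nanoseconds.

```python
import re

# Seconds per unit for the suffixes accepted by Go's time.ParseDuration.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# One term: a decimal number followed immediately by a unit suffix.
_TERM = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text):
    """Convert a Go-style duration string such as "1h30m" to seconds."""
    sign, body = 1.0, text
    if body and body[0] in "+-":
        sign = -1.0 if body[0] == "-" else 1.0
        body = body[1:]
    if body == "0":  # Go accepts a bare zero without a unit
        return 0.0
    total, pos = 0.0, 0
    while pos < len(body):
        match = _TERM.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:  # empty string, or a sign with nothing after it
        raise ValueError(f"invalid duration: {text!r}")
    return sign * total


print(parse_go_duration("1h"))     # 3600.0
print(parse_go_duration("1h30m"))  # 5400.0
```

Under this syntax a bare `3600` is rejected because it has no unit, which is why wording such as "expiration in seconds" could mislead readers.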

Address review feedback on the customer-managed workflow runners doc:
clarify that each runner process can run one deployment plus one
Insights/policy job in parallel, split crash-recovery behavior by
workflow type (lease-based for Insights/policy, not redelivered for
deployments), and tighten config-reference nits (drop directional
"below", backtick OIDC group keys consistently, restate
token_expiration in Go duration terms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
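The lease-based recovery that the commit message above describes for Insights/policy jobs can be pictured with a toy in-memory model. Everything here is hypothetical: the `ClaimTable` class, the `try_claim` method, and the 60-second lease are illustrative assumptions, not Pulumi Cloud's implementation or a documented value. Per the same commit message, deployments are not redelivered, so this sketch does not apply to them.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

LEASE_SECONDS = 60.0  # hypothetical lease length, not a documented value


@dataclass
class ClaimTable:
    """Toy stand-in for the service-side bookkeeping of job claims."""
    # job_id -> (runner_id, lease expiry timestamp)
    claims: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def try_claim(self, job_id, runner_id, now):
        """Grant the job unless a different runner holds an unexpired lease."""
        holder = self.claims.get(job_id)
        if holder is not None and holder[1] > now and holder[0] != runner_id:
            return False
        self.claims[job_id] = (runner_id, now + LEASE_SECONDS)
        return True


table = ClaimTable()
print(table.try_claim("job-1", "runner-a", now=0.0))   # True: first claim
print(table.try_claim("job-1", "runner-b", now=10.0))  # False: lease still live
print(table.try_claim("job-1", "runner-b", now=61.0))  # True: lease expired
```

The third call models crash recovery: if runner-a stops renewing its lease, another runner can pick up the job once the lease expires.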
@borisschlosser borisschlosser merged commit c76efbb into master May 7, 2026
7 checks passed
@borisschlosser borisschlosser deleted the boris/docs-customer-managed-agents-config-reference branch May 7, 2026 07:49